Add file create data appending #1163
base: dev
nwbfile = writer.read()

# added one more entry as opened read/write
self.assertEqual(len(nwbfile.file_create_date), 2)
Please also test the second round-trip, i.e., close the file and re-open it in read-mode and confirm that the change to file_create_date is still present. I am concerned that the file_create_date dataset is not chunked and therefore cannot grow, or the change is not saved for some reason.
@rly I've pushed something but I need to review that again tomorrow.
@rly You were right. The additional entry does not reach the file.
h5dump -A unittest_file_create_date.nwb | grep -A 10 file_create_date
HDF5 "unittest_file_create_date.nwb" {
GROUP "/" {
ATTRIBUTE ".specloc" {
DATATYPE H5T_REFERENCE { H5T_STD_REF_OBJECT }
DATASPACE SCALAR
DATA {
(0): GROUP 6512 /specifications
}
}
ATTRIBUTE "namespace" {
DATATYPE H5T_STRING {
--
DATASET "file_create_date" {
DATATYPE H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
}
GROUP "general" {
DATASET "institution" {
Questions:
- How can I fix that?
- How can I require a newer hdmf version so that the tests pass?
To fix that, the dataset has to be chunked. @ajtritt -- is there a way to chunk only the NWBFile.file_create_date dataset? I am also in favor of blanket chunking all datasets in NWB...
To use changes in a newer hdmf version, the changes must have been released on PyPI. The recent "mode" function addition isn't released yet, but we could do that this week if these issues are pressing.
A new hdmf would be nice!
How do I force the stored dataset to be chunked?
I tried
diff --git a/src/pynwb/io/file.py b/src/pynwb/io/file.py
index 1ddeb310..2ec342d9 100644
--- a/src/pynwb/io/file.py
+++ b/src/pynwb/io/file.py
@@ -3,6 +3,7 @@ from hdmf.build import ObjectMapper
 from .. import register_map
 from ..file import NWBFile, Subject
 from ..core import ScratchData
+from hdmf.backends.hdf5.h5_utils import H5DataIO


 @register_map(NWBFile)
@@ -156,6 +157,10 @@ class NWBFileMap(ObjectMapper):
         dates = list(map(dateutil_parse, datestr))
         return dates

+    @ObjectMapper.object_attr('file_create_date')
+    def file_create_date_obj_attr(self, container, manager):
+        return H5DataIO(container.file_create_date, chunks=True)
+
     @ObjectMapper.constructor_arg('file_name')
     def name(self, builder, manager):
         return builder.name
but that does not work.
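For reference, a minimal h5py-only sketch of what "growable" means at the HDF5 level. It does not go through pynwb or hdmf, and the helper name is_growable and the dataset names are made up for illustration. The `( 1 ) / ( 1 )` in the h5dump output above is current size / maximum size, so the stored dataset has a fixed maximum of one element. In plain h5py, chunks=True on its own still leaves maxshape equal to shape; the dataset only becomes extendable once maxshape is unlimited as well. Whether that is also what blocks the H5DataIO attempt above has not been verified here.

```python
import h5py


def is_growable(dset):
    """True if the first axis of an h5py dataset can be extended (hypothetical helper)."""
    return dset.chunks is not None and dset.maxshape[0] is None


# In-memory HDF5 file: nothing is written to disk.
with h5py.File("growable_demo.h5", "w", driver="core", backing_store=False) as f:
    str_dt = h5py.string_dtype()  # variable-length strings, as in the dump above

    # Default layout: contiguous, maxshape == shape.
    f.create_dataset("contiguous", shape=(1,), dtype=str_dt)
    # Chunked, but maxshape still defaults to the current shape.
    f.create_dataset("chunked_only", shape=(1,), dtype=str_dt, chunks=True)
    # Chunked and unlimited along the first axis.
    growable = f.create_dataset(
        "growable", shape=(1,), dtype=str_dt, chunks=True, maxshape=(None,)
    )

    results = {name: is_growable(f[name]) for name in f}

    # Appending one more timestamp works only for the last variant.
    growable[0] = "2019-11-01T12:00:00+00:00"
    growable.resize((2,))
    growable[1] = "2019-11-02T09:30:00+00:00"
    final_shape = growable.shape
```

Only the third dataset reports as growable. If the hdmf route is pursued, it may be worth checking whether a maxshape has to be passed along with chunks.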
Not sure if I have the right solution for you, but a couple of thoughts:
- I think it is important to expose this behavior explicitly to the user. While doing this implicitly behind the scenes is convenient, it makes the process opaque.
- We should try not to mix front-end and backend functionality, i.e., using the HDF5-specific H5DataIO in the ObjectMapper (or Container) is problematic as this will not translate to other backends.
- This issue also came up with DynamicTable at some point, because we wanted all columns of the table to be chunked so they can be extended. @rly @ajtritt was that issue solved and would that same strategy apply here?
Ultimately, I think the core issue is that we want specific datasets to be written in a resizable fashion (so they can grow). In the case of HDF5 that requires chunking but for other backends that may or may not be the case. In that vein, I think what we may need is a generic (backend-agnostic) way to provide write-hints, which in this case would say "make this dataset resizable". I'm wondering whether we could add I/O hints on the builder for this and in the object-mapper a way to ask for I/O hints for fields. It would then be up to the backend to decide what to do with those I/O hints.
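To make the write-hint idea a bit more concrete, here is a rough pure-Python sketch. None of these names (WriteHint, HintedDataset, hdf5_create_kwargs, other_backend_kwargs) exist in hdmf or pynwb; they are placeholders showing how a hint attached to a builder could be interpreted differently by each backend.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class WriteHint(Enum):
    """Backend-agnostic I/O hints (hypothetical, not an hdmf API)."""
    RESIZABLE = auto()


@dataclass
class HintedDataset:
    """Stand-in for a dataset builder that carries hints next to its data."""
    name: str
    data: list
    hints: set = field(default_factory=set)


def hdf5_create_kwargs(builder):
    """How an HDF5 backend might translate hints into create_dataset keywords."""
    kwargs = {}
    if WriteHint.RESIZABLE in builder.hints:
        # HDF5 needs chunked storage and an unlimited maximum size to grow.
        # (None,) assumes a 1-D dataset such as file_create_date.
        kwargs["chunks"] = True
        kwargs["maxshape"] = (None,)
    return kwargs


def other_backend_kwargs(builder):
    """A backend whose storage already grows freely can ignore the hint."""
    return {}


dates = HintedDataset(
    "file_create_date", ["2019-11-01T12:00:00+00:00"], {WriteHint.RESIZABLE}
)
plain = HintedDataset("institution", ["Some Institute"])
```

The container and object mapper would only ever say "resizable"; chunking stays an HDF5 detail inside the backend.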
@oruebel I totally agree that an HDF5-specific solution is the wrong thing to do here. But up to now I don't have any solution at all.
I'm starting to work on this again.
> I think it is important to expose this behavior explicitly to the user. While doing this implicitly behind the scenes is convenient, it makes the process opaque.
What implicit part are you concerned about? "Making the dataset chunked" or "adding new entries to file_create_date"? The latter is how nwb-schema says file_create_date should be handled.
> Ultimately, I think the core issue is that we want specific datasets to be written in a resizable fashion (so they can grow). In the case of HDF5 that requires chunking but for other backends that may or may not be the case. In that vein, I think what we may need is a generic (backend-agnostic) way to provide write-hints, which in this case would say "make this dataset resizable". I'm wondering whether we could add I/O hints on the builder for this and in the object-mapper a way to ask for I/O hints for fields. It would then be up to the backend to decide what to do with those I/O hints.
Yes, that would be required. Of course my hack above is a hack and cannot be merged as is, but I first wanted to get something working and then generalize the solution. I just saw that hdmf.build.DatasetBuilder has a chunks argument as well.
I seem not to understand how the object mappers work. According to https://pynwb.readthedocs.io/en/stable/overview_software_architecture.html?highlight=architecture#objectmapper I would think that
$ git diff .
diff --git a/src/pynwb/io/file.py b/src/pynwb/io/file.py
index 2c629ab7..a7057941 100644
--- a/src/pynwb/io/file.py
+++ b/src/pynwb/io/file.py
@@ -3,7 +3,7 @@ from hdmf.build import ObjectMapper
 from .. import register_map
 from ..file import NWBFile, Subject
 from ..core import ScratchData
-
+from hdmf.build import DatasetBuilder

 @register_map(NWBFile)
 class NWBFileMap(ObjectMapper):
@@ -152,6 +152,10 @@ class NWBFileMap(ObjectMapper):
         date = dateutil_parse(datestr)
         return date

+    @ObjectMapper.object_attr('file_create_date')
+    def file_create_date_obj_attr(self, container, manager):
+        return DatasetBuilder('file_create_date', data=container.file_create_date, chunks=True)
+
     @ObjectMapper.constructor_arg('file_create_date')
     def dateconversion_list(self, builder, manager):
         datestr = builder.get('file_create_date').data
should work, but it doesn't. Any hints?
e6459da to 9137f2a
We need to find a way to tell pynwb that certain datasets in HDF5 need to be written as chunked by default. Only then are they appendable. I don't know how to do that.
@t-b ah, ok. Sounds like a job for
Using an if/elif chain is easier to understand.
… load

The file_create_date entry holds according to [1]

    A record of the date the file was created and of subsequent modifications.

But until now we never added additional entries to file_create_date. We now do that when the file is not opened read-only.

[1]: https://nwb-schema.readthedocs.io/en/latest/format.html#nwb-n-file
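The behavior described in this commit message could look roughly like the sketch below. It is not the code from this PR: the function name and the now parameter are invented for illustration, and the mode strings are assumed to be the usual HDF5-style ones, with "r+" and "a" treated as "existing file opened for modification".

```python
from datetime import datetime, timezone


def file_create_dates_on_open(existing_dates, mode, now=None):
    """Return the file_create_date list after opening a file (hypothetical helper)."""
    dates = list(existing_dates)
    if mode == "r":
        # Read-only: the file cannot change, so no new entry is recorded.
        return dates
    elif mode in ("r+", "a"):
        # Opened for modification: record this access as a further entry.
        dates.append(now if now is not None else datetime.now(timezone.utc))
        return dates
    else:
        raise ValueError(f"unsupported mode for an existing file: {mode!r}")


created = datetime(2019, 11, 1, 12, 0, tzinfo=timezone.utc)
reopened = datetime(2019, 11, 2, 9, 30, tzinfo=timezone.utc)

read_only = file_create_dates_on_open([created], "r")
appended = file_create_dates_on_open([created], "a", now=reopened)
```

The new entry only survives a second round-trip if the stored dataset can actually grow, which is the open question in the review thread.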
3d70398 to 3fb6358
Close #990.
Requires hdmf-dev/hdmf#280.